An Overview on Data Mining

Author

  • Saurabh Gupta
Abstract

In this paper, we present an overview of research issues in data mining. We discuss mining of web data, referred to here as web data mining. In particular, we focus on web data mining research in the context of our web data mining project, ‘A Study on Web Data Mining’. We have categorized web data mining into three areas: web content mining, web structure mining and web usage mining. We have highlighted and discussed the research issues involved in each of these categories. We believe that web data mining will be a topic of exploratory research in the near future.

INTRODUCTION

The advent of the World Wide Web has caused a dramatic increase in the usage of the Internet. The World Wide Web is a broadcast medium through which a wide range of information can be obtained at low cost. Information on the WWW is important not only to individual users but also to business organizations, especially where critical decision making is concerned. Most users obtain WWW information using a combination of search engines and browsers. However, these two types of retrieval mechanisms do not necessarily address all of a user’s information needs. This is particularly true of business organizations. They currently lack suitable tools to systematically harness strategic information from the web and to analyze that data to discover knowledge that supports decision making. A recent study provides a comprehensive and comparative evaluation of the most popular search engines, and a more recent survey of web query processing has also appeared. The resulting growth in online information, combined with the largely unstructured nature of web data, necessitates the development of powerful yet computationally efficient web data mining tools. Web data mining can be defined as the discovery and analysis of useful information from WWW data.
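Of the three categories named in the abstract, web usage mining is the most direct to illustrate, because it starts from server access logs. The following is a minimal Python sketch, not the authors' method. It assumes the log is in the Apache Common Log Format. The helper names `parse_line` and `summarize_usage` and the sample lines are hypothetical. Counting only 2xx responses as page views is also a simplifying assumption. The sketch counts hits per page and groups each host's requests into a click path in log order.

```python
import re
from collections import Counter, defaultdict

# Assumed input: Common Log Format, i.e.
# host ident authuser [date] "METHOD path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (host, path, status) for a CLF line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return match["host"], match["path"], int(match["status"])


def summarize_usage(lines):
    """Count successful page hits and record each host's click path in log order."""
    page_hits = Counter()
    paths_by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing
        host, path, status = record
        if 200 <= status < 300:  # assumption: only 2xx responses count as page views
            page_hits[path] += 1
            paths_by_host[host].append(path)
    return page_hits, dict(paths_by_host)


if __name__ == "__main__":
    # hypothetical sample log
    sample_log = [
        '10.0.0.1 - - [10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '10.0.0.1 - - [10/Oct/2014:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 1045',
        '10.0.0.2 - - [10/Oct/2014:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '10.0.0.2 - - [10/Oct/2014:13:57:40 +0000] "GET /missing.html HTTP/1.1" 404 209',
    ]
    hits, paths = summarize_usage(sample_log)
    print(hits.most_common())
    print(paths)
```

Identifying users by IP address and treating a whole log as one session are the crudest possible choices. A fuller usage-mining pipeline would split sessions on time gaps and use better user identification before looking for navigation patterns.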
The web involves three types of data: the data on the WWW itself, the web log data about the users who browsed the web pages, and the web structure data.

WHY DATA MINING?

The Web, an immense and dynamic collection of pages that includes countless hyperlinks and huge volumes of access and usage information, provides a rich and unprecedented source for data mining. However, the Web also poses several challenges to effective resource and knowledge discovery:

• Web page complexity far exceeds the complexity of any traditional text document collection. Although the Web functions as a huge digital library, its pages lack a uniform structure. They also vary far more in authoring style and content than any set of books or traditional text-based documents. Moreover, the tremendous number of documents in this digital library has not been indexed, which makes searching the data it contains extremely difficult.
• The Web is a highly dynamic information source. Not only does the Web continue to grow rapidly, but the information it holds is also constantly updated. News, stock market, service center and corporate sites revise their Web pages regularly, and linkage information and access records are also updated frequently.
• The Web serves a broad spectrum of user communities. The Internet’s rapidly expanding user community connects millions of workstations, and these users have markedly different backgrounds, interests and usage purposes. Many lack good knowledge of the information network’s structure and are unaware of the heavy cost of a particular search. They frequently get lost in the Web’s ocean of information and can chafe at the many access hops and long waits required to retrieve search results.
• Only a small portion of the Web’s pages contain truly relevant or useful information. A given user generally focuses on only a tiny portion of the Web and dismisses the rest as uninteresting data that only swamps the desired search results.
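The last challenge above, finding the small portion of pages that are actually worth reading, is one motivation for web structure mining over the web structure (hyperlink) data. As one illustration, here is a short Python sketch of PageRank computed by power iteration. PageRank is a technique we swap in; this paper does not name a specific algorithm. The link graph, the function name `pagerank` and the conventional damping factor of 0.85 are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)  # include pages that only appear as link targets
    n = len(pages)
    if n == 0:
        return {}
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # every page receives the "random jump" share regardless of links
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # split this page's damped rank evenly across its out-links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # dangling page (no out-links): spread its damped rank over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # hypothetical four-page site: C is linked from A, B and D
    site = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

The scores form a probability distribution over pages (they sum to 1). A page with no in-links, such as D in the example, keeps only the random-jump share.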
How can a search identify the portion of the Web that is truly relevant to one user’s interests? How can a search find high-quality Web pages on a specified topic?

DATA MINING

Data mining is the process of discovering actionable information from large sets of data. It uses mathematical analysis to derive the patterns and trends that exist in the data. Typically, these patterns cannot be found by traditional data exploration, either because the relationships are too complex or because there is too much data. In other words, data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. It uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD). The key properties of data mining are:

• Automatic discovery of patterns
• Prediction of likely outcomes
• Creation of actionable information
• Focus on large data sets and databases

Vishal et al., (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014, 1269-1272
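Two of the key properties listed above, automatic discovery of patterns and prediction of likely outcomes, can be made concrete with association rules. The Python sketch below counts how often pairs of items occur together. It keeps rules `X -> Y` whose support (the fraction of transactions containing both items) and confidence (an estimate of P(Y | X)) pass the thresholds. This is a brute-force pair count, not the full Apriori algorithm and not a method taken from this paper. The function name `pair_rules`, the thresholds and the sample page sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Find single-item rules lhs -> rhs that meet both support and confidence thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicates and fix a canonical pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # strongest rules first; ties broken alphabetically for stable output
    return sorted(rules, key=lambda rule: (-rule[3], rule[0], rule[1]))


if __name__ == "__main__":
    # hypothetical sessions: the set of pages each visitor viewed
    sessions = [
        ["home", "products", "cart"],
        ["home", "products"],
        ["home", "about"],
        ["products", "cart"],
    ]
    for lhs, rhs, support, confidence in pair_rules(sessions):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Real miners avoid counting every pair by pruning candidates whose individual items are already infrequent. This sketch skips that step for brevity and does not scale to the large data sets the text has in mind.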

Publication date: 2014